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Abstract 

Ecological studies involving counts of abundance, presence-absence or occupancy rates 
often produce data having a substantial proportion of zeros. Furthermore, these types of 
processes are typically multivariate and only adequately described by complex nonlinear 
relationships involving externally measured covariates. Ignoring these aspects of the data 
and implementing standard approaches can lead to models that fail to provide adequate 
scientific understanding of the underlying ecological processes, possibly resulting in a loss of 
inferential power. One method of dealing with data having excess zeros is to consider the class 
of univariate zero-inflated generalized linear models. However, this class of models fails to 
address the multivariate and nonlinear aspects associated with the data usually encountered 
in practice. Therefore, we propose a semiparametric bivariate zero-inflated Poisson model 
that takes into account both of these data attributes. The general modeling framework is 
hierarchical Bayes and is suitable for a broad range of applications. We demonstrate the 
effectiveness of our model through a motivating example on modeling catch per unit area 
for multiple species using data from the Missouri River benthic fish study, implemented by 
the United States Geological Survey. 
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1 Introduction 



The problem of having a large proportion of zero values is a common characteristic of data 
obtained from environmental and ecological studies involving counts of abundance, presence- 



absence or occupancy rates (Clarke and Green 1988 Welsh et al. , 1996 Martin et al. 2005 



Berry et al. , 2005). Ignoring or excluding values to facilitate the analysis of zero inflated 



data can result in a loss of important information and, thus, diminished explanatory power. 
For example, when studying abundance or presence-absence of a species in ecological studies, 
having a large proportion of zero values might be an indication that the species is rare or 
endangered, hard to detect, or both. The problem of dealing with rare species and species 
with a low probability of detection is extremely common in ecological studies and so data 
having a preponderance of zeros is often encountered. Thus, standard distributions such as 
Poisson, binomial and negative-binomial often fail to provide an adequate fit. On the other 
hand, one potentially appropriate class of distributions for describing this type of data is 
the class of zero-inflated distributions as they properly account for a large proportion of zero 



values (Cohen, 1963; Lambert, 1992 Johnson et al. , 1997). 

Although it is conceivable that data having excess zeros may come from any distribution, 
typically, in practice, the distributions are discrete. Therefore, several popular models that 
account for data with excess zeros have emerged, including the zero- inflated Poisson (ZIP), 
zero-inflated binomial (ZIB) and the zero-inflated negative binomial (ZINB). The ZIP model 
is especially useful in analyzing count data with a large number of zero observations. How- 
ever, in practice, the ZIB model is sometimes used for cases where an upper bound exists 
for the response whereas the ZINB model is sometimes used for cases where the data are 
overdispersed. Nevertheless, the ZIP model has experienced wide-spread popularity over the 



last decade and has been applied to numerous problems in horticulture (Hall, 2000), man- 



ufacturing (Lambert, 1992), and various other fields of study, including health operations 
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(I Wang et al.l 120021) , meteorology (iWikle and Andersonl 120031) , ecology (I Welsh et al.l 11996 



Martin et al. 


2005; 


Ver Hoef and Jansen 


2007 


), and fisheries biology ( 


Minami et al. 


2007 



Arab et al-l |2008[ [Wildhaber et al-l |2011[ ). 

Despite the fact that the utility of multivariate ZIP models is extensive, the relevant 
literature is somewhat limited. Early research focused on extensions of the univariate Poisson 



binomial (Skellam 1952), which is a compound distribution of the binomial and Poisson. 



However, more recently, Li et al. (1999) formulate an m-dimensional ZIP distribution by 



linking all of the univariate distributions together through one common distribution, as is 



done in the case of the m-variate Poisson distribution (Johnson et al. , 1997). Moreover, Li 



et al. (1999) focus on the bivariate case by deriving a bivariate ZIP (BivZIP) distribution 



as a mixture of two univariate Poisson distributions and a point mass at (0,0) (i.e., a point 



mass when both count values equal zero). In order to estimate these models, Li et al. 



(1999) use maximum likelihood. Extending this work, Majumdar et al. (2010) propose a 



Bayesian BivZIP regression model with estimation based on data augmentation. In contrast. 



Schmidt and Rodriguez (2011) discuss models for multivariate counts observed at fixed 
spatial locations based on a continuous mixture of independent Poisson distributions. 

We propose a semiparametric ZIP modeling approach for bivariate count processes which 
extends the existing bivariate zero-inflated modeling approaches to utilization of nonlinear 
covariates in the model as well as modeling zero-inflation probabilities through a multinomial 
logit regression. The modeling approach we propose produces a class of bivariate semipara- 
metric zero-inflated Poisson models cast in a hierarchical Bayesian framework. Critically, 
the semiparametric aspect of the proposed approach allows us to readily consider possi- 



ble nonlinear effects of covariates (Ruppert et al. , 2003, 2009) in a bivariate zero- inflated 



setting. Finally, although the modeling framework introduced in this paper is extremely 
natural for environmental and ecological applications, the only example of such a bivariate 



semiparametric modeling technique in the literature is Arab (2007) 
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The models we propose could potentially be considered from either a classical or Bayesian 
perspective; however, as the level of complexity increases, it is convenient (and often neces- 
sary) to make use of the Bayesian paradigm. In this context, accounting for uncertainty in 
different levels of the model can be effectively facilitated through using a hierarchical mod- 
eling framework. For a comprehensive discussion on hierarchical models for environmental 



and ecological data see Wikle (2003), Royle and Dorazio (2008), Cressie and Wikle (2011) 



and the references therein. 

Our approach is extremely useful for developing models of abundance in settings with 
multiple species, where univariate distributions are less suitable. For example, ichthyology 
(i.e., fisheries biology) is one area of ecology where modeling counts of abundance of multiple 
species is prevalent. Frequently, in this context, species are biologically related and thus it is 
expected that relative abundance, as a function of habitat, year, gear (i.e., type of net) used 
to catch the fish, etc., will be correlated. Therefore, a potentially advantageous approach for 
modeling relative abundance is to borrow strength across related species through the use of 
multivariate distributions. 

We demonstrate the effectiveness of our methodology through a motivating example 
in ichthyology, namely modeling abundance of species in a bivariate setting using fish catch 
data. Here, using benthic fish data collected by the United States Geological Survey (USGS) 



on the Missouri River (Berry et al. , 2005), we illustrate the usefulness of our framework by 
modeling the catch per unit area (CPUA), while determining which factors are related to 
the zero-inflation probability for a given fish species and which factors are related to catch 
rates. Critically, these goals are accomplished while accounting for the dependence between 
different species. 

The remainder of this paper is organized as follows. Section |2] describes our motivating 
example, the Missouri River benthic fish study. Semiparametric BivZIP models are presented 
in Section |3| Section |4] applies the proposed model to our motivating example, modeling 



CPUA for benthic fish on the Missouri River. Finally, Section [5] contains discussion. Deriva- 
tion of all full conditional distributions and details surrounding our Markov chain Monte 
Carlo (MCMC) algorithm are left to the Appendix. 



2 Missouri River Benthic Fish Study 

In 1995, USGS and the Montana Department of Fish, Wildlife, and Parks commenced a 
study to look at benthic fishes in the warm-water portion of the Missouri River system 



(Berry and Young, 2001 Berry et al. , 2005). The Missouri River (USA) extends 3,764 



kilometers from southwest Montana to the Mississippi River and contains several species of 
benthic fish (Figure [T|. Benthic fish are fish that live or feed on the bottom of the river and 
are of particular interest because of their sensitivity to changes in habitat. The main goal 
of the study was to evaluate the status, distribution and habitats associated with various 
benthic fish in the Missouri River for the purpose of providing vital information necessary 
for improvement in their management. 

Included in this study were 26 different benthic fish species; however, in Section |4] we will 
focus only on two species, common carp {Cyprinus carpio) and channel catfish {Ictalurus 
punctatus) which are two generalist species with overlapping habitat associations. In this 
study, fisheries biologists divided the Missouri River into three zones and each zone was 
divided into segments; the upper zone or "least-altered zone" (LA) included segments 3, 5, 
and 9, the middle or "inter-reservoir" zone (IR) included segments 7, 8, 10, 12, 14 and 15, 
and the lower or "channelized zone" (CH) included segments 17, 19, 22, 23, 25 and 27 (see 
Figure [l] and Wildhaber et al. (2011) for further details). Although there were twenty seven 



segments included in the study design, due to financial and administrative constraints, only 
fifteen segments were sampled during each of the three years of the study considered here. 
Common to fisheries field studies, the data considered here are based on multiple gears in 
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addition to consisting a large proportion of zeros, thus complicating analysis. Using standard 



parametric statistical methods on data from each gear separately, Berry et al. (2005 ) excluded 
several river segments and macrohabitats from the analysis due to a high percentage of zero 
observations (i.e., violation of normality and homogeneity of variance assumptions). As a 
result, the previous analysis may have imposed limitations on the inferential scope, making 
comprehensive conclusions about the complete study domain inaccessible. In particular. 



the analyses conducted by Berry et al. (2005) were hmited due to several factors. First, the 



authors aggregated the data to larger spatial and temporal scales in order to alleviate a large 
percentage of zero observations and thus potentially caused a loss in information. Second, 
separate analyses were conducted for each gear rendering inference on gear intractable. In 
addition, because this study considered each species separately, using a univariate analysis, 
the ability to draw conclusions regarding correlation between various species was necessarily 
limited. 

Our goal is to develop and implement a modeling framework which will allow for mean- 
ingful ecological interpretations based on the model results while concurrently raising the 
predictive precision of the model in the presence of realistic constraints. Ultimately, similar 



to the zero-inflated modeling approach discussed in Wildhaber et al. (2011), our approach 
can identify the type of gear, macrohabitat, segment, year and physiochemical characteristics 
that explain where certain species are most likely to populate. The approach we propose for 
this data is a hierarchical Bayesian semiparametric BivZIP model. Specifically, the model 
we develop incorporates those parameters that help explain the mean fish count as well 
as those that explain the zero- inflation probability (i.e., excess zero observations), while 
accommodating nonlinear relationships and borrowing strength across "similar" species. 
The data used in our analysis was collected over the three study years 1996 to 1998 



see 



Berry et al. (2005) for a comprehensive discussion). Channel catfish and common carp 



were chosen because, in part, they reflect two commercially important species that can be 



found in many habitats within different rivers, including the Missouri River. Biologically, a 
study of these two species, in a bivariate setting, is of interest due to overlapping habitat 



use (Berry et al. , 2005). Based on criteria similar to Berry et al. (2005), we excluded several 
outlier observations as well as data for segments 7, 8, 10, 12, and 14 from the analysis, 
due to extremely low levels of catch or none at all, for at least one of the two species. Even 
though this resulted in an analysis that only used 960 of the 1477 total observations available. 



our analysis is still able to include more segments than the previous analysis of Berry et al. 



(|2005|), since our modeling approach directly accommodates a significantly higher percentage 
of zero values than models following standard distribution?^ 



Although, the resulting data contains many observations greater than zero it also contains 
a substantial portion of zero observations. Specifically, after removing the aforementioned 
segments, the observations for channel catfish and common carp included 51% zeros and 
67.7% zeros respectively. In the bivariate setting, the percentage of zeros for both species is 
39.7%, approximately 28% of the samples include a non-zero catch for channel catfish and 
zeros for common carp, approximately 11% of the samples include zeros for channel catfish 
and a non-zero catch for common carp, and finally 21.25% non-zero observations for both 
species. 

Based on biological considerations, ecologists expect that these two species use similar 
habitats and, as such, it is sensible that abundance of each species, as a function of gear, 
segment, macrohabitat and year, should be related. In addition, exploratory data analysis, 
combined with current species specific knowledge, suggests that some of the explanatory 
variables are nonlinearly related to the Poisson log-intensity parameters. Consequently, 
there is sufficient motivation for considering a semiparametric BivZIP modeling approach in 
this context. 



In a univariate setting, Berry et al. (2005) excluded segments 3, 5, 7, 8, 10, 12, and 14 for channel catfish. 
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3 Semiparametric Bivariate Zero-Inflated Models 



It is natural to model correlated counts using a bivariate discrete distribution such as a 
bivariate Poisson distribution. However, the computational difficulties involved in fitting 
such models have traditionally deterred researchers from using such an approach. Recent 
advances in hierarchical Bayesian modeling and, specifically, the improvement of compu- 
tational methods such as MCMC, have provided mechanisms for easy implementation of 



bivariate discrete distributions such as the bivariate Poisson (e.g., see Tsionas (2001), Nt 



zoufras (2009) and Majumdar et al. (2010)). Although the bivariate Poisson distribution 



can be formulated from several directions ( [Kocherlakota and Kocherlakota , 1992 Schmidt 



and Rodriguez, 2011), the formulation chosen here is a natural extension to the univari- 



ate Poisson distribution that allows for correlation among the response variable for the two 



populations under consideration (Li et al. , 1999). 



Similar to Li et al. (1999), we let Yij and denote the j-th observation from the first 
and second population, respectively. Then, for j = 1, 



Y 



Y, 



2i 



Zij + Z-ij, 

Z2j + Zsj, 



where (Yij,Y2jy ~ BivPois(Aij, \2j, Asj) and Zij, Z2j and Z^j are mutually independent 
Poisson random variables with intensity parameters Xij, \2j and Asj, respectively (Kocher- 
lakota and Kocherlakota, 1992 Johnson et al. , 1997). Assuming Yij and Y2j are variables 



from a bivariate Poisson distribution, the covariance between Yij and Y2j is given by 

cov(Yij, Y2j) = cov{Zij + Z^j, Z2j + Z^j) = vai^Zsj) = X^j, 
and, thus, the correlation coefficient between Yij and Y2j is 



corr(yij,F2i. 



V i^lj + '^3i)(A2j + hj] 
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The joint probabihty mass function is given by 



P(n, = I2, = y,,) = exp{-(A,, + X,, + A3,)} g ^^^^ _ _ . 



^Ij 



To construct a BivZIP model, we consider a mixture of a point mass at (0,0), two 
univariate Poisson random variables, and a bivariate Poisson random variable. Then 



(0,0) w.p. poj, 

(Pois(Aij + A3j),0) w.p. pij, 

(0,Pois(A2j + Asj)) w.p. p2j, 

BivPois(Aij, Aaj, Asj) w.p. p^j. 



(2) 



where "w.p." denotes a shorthand for "with probability" and poj 



Plj -P2j -P3j), 



denotes the probability that observations follow a bivariate Poisson distribution with 
joint probability mass function defined in ([T]). Henceforth, we say that {Yij,Y2jy ~ 
BivZIP(Aij, A2j, Xsj , Plj , p2j , Psj) if {Yij,Y2j)' follows the distribution defined by ([2]). For 
a comprehensive discussion on bivariate zero-inflated Poisson models (and extensions to 



multivariate cases) see Li et al. (1999) and the references therein. 

Semiparametric models provide an extremely versatile tool for describing nonlinear rela- 
tionships and, thus, have become increasingly more prevalent among various scientific dis- 



ciplines (e.g., Fahrmeir and Echavarria 2006 Lam et al. 2006 Chiogna and Gaetan 2007 



Dagne 2010 


Musio et al. , 


2010 


Liu et al. 


2010) 



to modeling bivariate zero-inflated count data, although providing a natural framework for 
modeling many nonlinear ecological and environmental phenomena, remains undeveloped. 
Therefore, even though we are motivated by a specific application (relative abundance for 
multiple species), the methodology proposed here is of independent interest. 

The semiparametric modeling framework we consider uses general spline based nonpara- 



metric regression for univariate predictors (Ruppert et al. , 2003, 2009). More specifically. 



we consider multinomial logit models for the mixture probabilities (e.g., see Fahrmeir and 



Tutz, 2001) and semiparametric regression models for the logarithm of the latent Poisson 
intensity parameters. In general, the models we propose for the intensity parameters and 
mixture probabilities can be expressed as 

log(Afj) = Pi^o + Pi^iXij H ^ Pe 

\og{prj/P0j) = IrO + IrlXlj + 'Jr2X2j ^ ^-Irgr^g^j, (4) 

where £ = 1, 2, 3, r = 1, 2, 3 and eij are assumed to be i.i.d. N{0,a'^J and the intercepts 
can be considered random effects that help account for uncertainties arising from sampling 
errors and covariates potentially missing from the analysis. Additionally, each function fei{-) 
is an unknown smooth function that is assumed to be approximated sufficiently well using 
a penalized spline. In particular, the smooth functions fei{-) in ^ are based on thin-plate 
splines and can be generically written in the form 

K 

f{x\-)=(3x + '^bk\x-Kk\^, (5) 

k=l 

where bk ~ N{0,aD and /ti < ^2 < ■ • ■ < denote fixed knot points. Here, we focus on 
thin plate splines because of their good numerical properties and note that other orthogonal 
basis functions could also be used in this context. 

For ease of exposition, we temporarily suppress the dependence on i and consider a 
special case of ([3]). In particular, let 

Xj = log(Aj) = (3o + PiXij + /(x2j) + Sj (6) 



and 



X - [1 xij a;2j]i<j<„; — [\x2j - K,k\^]i<j<n- 



l<k<K 



Further, let 



l<k,k'<K 



and A = (Ai, . . . , A„)', then the penahzed sphne regression is obtained by minimizing 



(7) 



where /3 = (/3o, /32)', b = (61, . . . , 6^)' and b corresponds to a fixed smoothing parameter 
(i.e., penalty parameter). Additionally, let f3 be fixed and b random, with E(b) = 0, 
cov(b) = cr^ri^"'^, where = Sa^. As long as (b', e')' is normally distributed, where 
£ = {ei,e2, ■ ■ ■ ,Sn)' 1 with b and e independent, one can obtain an equivalent mixed model 



representation of the penalized spline (Brumback et al. 1999); see Crainiceanu et al. (2005), 



Gimenez et al. (2006), Holan et al. (2008) and the references therein for complete details. 
Specifically, the P-spline representation of the generalized linear mixed model (GLMM) is 
given by 



A = X/3 + Zxb + £, 



(8) 



with 



cov 








Again, following 



Crainiceanu et al. 



( |20()5| , define Z = Zk^]^'"^ and b = fl^^^^u the equiv- 
alent P-spline model, for the log-intensity parameters, in the form of a GLMM can be 
expressed as 



A = X/3 + Zu + £, 



(9) 



with 



cov 



Although it is possible to write out (|3]) explicitly in terms of its equivalent mixed model 
formulation, we do not pursue this general exposition for the sake of brevity. Additionally, 
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inclusion of spatial effects is straightforward and simply amounts to including a bivariate 
radial basis smoother (or basis corresponding to a proper spatial covariance) in ^ (i.e., a 



geoadditive model); see Kammann and Wand (2003) and Holan et al. (2008) for complete 
details. Discussion of specific models in the context of our motivating example is deferred 
until Section m 

One method of estimating this model is known as penalized quasilikelihood (PQL) and 



constitutes an approximation to the full likelihood (Breslow and Clayton 1993 Ruppert 



et al. , 2003). Another method for fitting generalized linear mixed models is to adopt a 



Bayesian approach and use Markov chain Monte Carlo (Robert and Casella 2004), which is 



the direction we pursue; see Zhao et al. (2006) for a comprehensive discussion 



Using ^ we propose hierarchical Bayesian semiparametric BivZIP models. Let [y\x] 
and [x] denote the conditional distribution of y given x and the unconditional distribution 



of X, respectively. Following Wikle (2003), and assuming conditional independence, the 



joint posterior distribution of the catch intensity, zero inflation probability and parameters, 
conditional on the data can be obtained using Bayes theorem. 

To completely specify a Bayesian semiparametric BivZIP model, we need to provide 
prior distributions for all parameters. Specifically, for i = 1,2,3 and = {um, . . . , unKfJ', 
we choose the prior densities as f3ii ~ A^(0,(T^^.), uuk ~ N{0,al^J {k = 1, . . . , Ki-), and 
■jri ~ A^(0, cr^^J with the hyperparameters cr^^^ and cr^^^ assumed known and chosen by the 
practitioner. In addition, the prior for a^^^ is chosen such that a^^^ ~ IG(c, d), where IG(c, d) 
corresponds to an Inverse Gamma distribution with shape parameter c and scale parameter 
d. 

As previously noted, in order to fit our model, we take a Bayesian MCMC approach. 
Nevertheless, many of the full conditional distributions needed to carry out the estimation 
will not be of standard form and so more sophisticated MCMC methods will be required such 



as Metropolis within Gibbs (see Robert and Casella, 2004, for a comprehensive overview). 
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Full details surrounding the full conditionals and MCMC algorithm are provided in the 
Appendix. 



4 Application to Modeling Catch Per Unit Area 
4.1 Model of abundance 

In order to model CPUA for multiple species in the Missouri River benthic fish study we use 
the semiparametric BivZIP models proposed in Section [3j As alluded to, these models at- 
tempt to simultaneously account for both sources of zeros present in our data, namely "sam- 
pling" and "structural" zeros, by using indicator variables corresponding to gears, macrohab- 
itats and segments as covariates for describing the zero-inflation probability. The covariates 
include: four different gears, including benthic trawl (BT), drifting trammel net (DTN), 
beach seine (BS), and electrofishing (EF), where EF is considered as a baseline (i.e., set to 
zero); four macrohabitats, including tributary mouth (TRM), secondary channel-connected 
(SCC), secondary channel not-connected (SCN), and Bend with TRM taken as the baseline, 
and three years (1996-1998) with 1998 set as baseline. Of the existing 15 segments in this 
study, due to extreme sparsity (> 95% zeros), only ten segments (3, 5, 9, 15, 17, 19, 22, 23, 
25, 27) were included in the analysis with segment 27 set as the baseline. Another variable 
used in the model describes the substrate composition (proportion of sand, gravel, and silt), 
where proportion silt is omitted from the model (since the proportions of silt, sand, and 
gravel are constrained to sum to one). Note that, aside from interpretation, the choice of 
baseline is arbitrary and has no effect on the analysis. 



Additionally, similar to Wildhaber et al. (2011), combined with extensive explanatory 



analysis, several continuous variables are considered in the model, including, depth, water 
temperature, conductivity, turbidity (log turbidity) and velocity. Ultimately, water tem- 
perature, depth, conductivity and log turbidity are candidates to be modeled nonlinearly, 
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whereas velocity is excluded from the model due to high correlation with depth. 

The fish counts for channel catfish (Species 1) and common carp (Species 2) are assumed 
to follow a BivZIP distribution, as described in Section [3| More specifically, we formulate a 
semiparametric hierarchical Bayesian model starting with the assumption that (Yij,y2j)' ~ 
BwZlF{ajXij,ajX2j,ajXsj,Pij,P2j,P3j), where aj accounts for the different areas covered by 
gears (or "level of effort") involved in measurement j [j = 1, . . . ,n). The "normalization" 
by level of effort is important and allows us to model catch per unit area data obtained 
by multiple gears. Without this normalization, the models would only apply to the case of 
single gears. Now, for i = 1,2,3, corresponding to Species 1, Species 2, and the common 
process, and for r = 1, 2, 3 let 

log(A^j) = Pi^igeaTj + /3£^2segment^- + /^^^smacrohabj + (3e^^yea.rj + /3^^5substratej 

+ fifi{depthj) + fej{temp-) + fe^si^turhj) + /£,9(conductj) + Eij, (10) 

\og{prj/poj) = 7rO + Trigear^- + 7r2segmentj. + 7r3niacrohabj + 7r4year^. 

+ 7r5substratej. (11) 



Note that, taken together, (10) and (11) are considered the "full" model; however, in general, 
each of the log-intensity models may have different covariates and semiparametric specifica- 
tions. As described below, several variants of this model are considered in our analysis and 
are detailed in Table [U 



Although in (10) and (11) it is relatively straightforward to include interaction terms for 



different subsets of the covariates (see Ruppert et al. , 2003); in practice, this requires careful 



monitoring. Specifically, in the zero-infiated case, sparsity is more likely to occur within a 
given level of an interaction term, which may cause problems with estimation due to lack of 
information. Based on sparsity considerations and the fact that, in our case, models without 



interaction terms exhibited lower deviance information criteria (DIG) values ( Spiegelhalter 



et al. , 2002), the models presented here only consider main effects. 
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The smooth functions fei{-) in (10) are based on thin-plate sphnes, as defined by (|5]), with 
20 knot points equally spaced in the covariate domain. Note that several of the covariates 
are categorical, which complicates interpretation of the intercepts. The coefficients of the 
model corresponding to a specific level of a categorical variable can be roughly interpreted 
as the log "mean" fish CPUA in that specific level, relative to the baseline level (the level 
set to zero), while holding all other variables fixed. 

For £ = 1, 2, 3, we define relatively noninformative prior densities for the unknown pa- 
rameters f3ei and ■jri as A^(0, 100), and for ua parameters we define the prior density as 
A^(0, 0"^^.). Additionally, the prior for cr^^. is chosen as cr^^. ~ lG(c, d) with c and d specified 
such that the prior mean and variance are equal to .001 and 100, respectively. Note that, 
in all cases, the priors chosen are vague but proper and thus, we maintain propriety of the 
posterior while imparting little impact on the analysis. 

The model we propose for this application is well suited, since the species are naturally 
linked together for each observation. Specifically, each observation defines a specific gear 
deployment over which the count for both species is recorded. In particular, these species 
are biologically related and thus it is expected that the CPUA for each observation, as a 
function of macrohabitat, year, gear (i.e., type of net) used to catch the fish and segment will 
be correlated. Also, note that we do not consider the problem of estimating true abundance 
and probability of detection in our analysis, instead, we limit our work to modeling relative 
abundance due to lack of information on gear efficiencies (i.e., detection probability) and 
repeated sampling. However, there are approaches available based on repeated sampling 



(e.g., capture-recapture) to estimate true abundance and detection probabihties (see Royle 



and Dorazio (2008) for a complete discussion) and can be implemented in a similar manner 
to our proposed modeling framework. 

The MCMC computations were implemented using Gibbs and Metropolis-Hastings within 
Gibbs sampling algorithms (see the Appendix). A total of 120,000 MCMC realizations were 
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obtained with 20,000 iterations discarded for burn- in. Subsequently, every fifth sample was 
kept for inference, resulting in 20,000 iterations total. Convergence of the MCMC chains 
was verified through visual inspection of trace plots of the sample chains. 

4.2 Model Selection 

Aside from models having interaction terms, we compared the performance of seven different 
models using DIG. Specifically, the models considered differed only in the form of their 
continuous covariates for log(turbidity), depth, water temperature, and conductivity (e.g., 
whether these covariates entered into the model linearly or nonlinear ly) . These models were 
selected mainly based on subject matter considerations as well as exploratory data analysis. 
Table [T] provides a comprehensive overview of the models being compared. 

Selection and inference based on these models helps foster a better understanding of 
specific linear or nonlinear effects that the covariates have on the common and individual 
Poisson log-intensities. Model 1 (Ml) contains nonlinear specifications for all of the contin- 
uous covariates in all three log-intensity models (i.e.. Ml is the "full" model). Conversely, 
Model 2 (M2) has nonlinear specifications for log(turbidity) and depth in the individual 
log-intensity models and has all linear covariates for the common log-intensity model. That 
is, this model assumes that the common effects of these two covariates are linear and the 
nonlinear effects only arise for the individual processes. Model 3 (M3) is similar to M2; how- 
ever, the linear terms for water temperature and conductivity which were not statistically 
significant in M2, were excluded. Model 4 (M4) is similar to M3; however, all the variables 
that were not statistically significant in M3 were excluded. The variables that were excluded 
for individual log-intensity models are year and substrates (sand and gravel), whereas for 
the common log-intensity process the variables are substrates, log(turbidity), depth, water 
temperature and conductivity. Model 5 (M5) is nonlinear in all of the continuous covariates 
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for the individual log-intensity models and linear in all of the covariates for the common 
log-intensity model. Model 6 (M6) is nonlinear for all continuous variables for the common 
log-intensity model and has linear covariates for the individual log-intensity models. Finally, 
Model 7 (M7) has linear covariates for all continuous variables in all of the log-intensity 
models. Our selection criteria favor the most parsimonious model with the smallest DIG 
value. 



4.3 Results 



Based on the DIG values (see Table [T]), M3 is considered to be the "best" model among all 
the models considered; thus, we only discuss the results of M3 here. Recall, M3 does not 
include water temperature or conductivity and the only nonlinear effects, log(turbidity) and 
depth, arise only in the individual log-intensity models. 

Posterior means and 95% credible intervals (GI) for the log-intensity models are given in 
Table [2j A covariate is considered to have a "significant" effect if its credible interval does 
not include 0. Note that for indicator variables corresponding to the levels of categorical 
variables, results are based on comparison of the variable relative to the baseline. Our 
model results for the log-intensity models (Table [2]) corroborate the univariate results of the 
previous analyses conducted by Berry et al.| ( |2005[ ) and Wildhaber et al. (2011) . However, 
our model also provides results for the common process; e.g., information about common 



habitat usage for both species. In contrast, the previous analyses of Berry et al. (2005), 



Arab et al. (2008) and Wildhaber et al. (2011) are incapable of delivering this information 



m a concise manner. 

Based on the results for the common process, the mean GPUA for both species is higher 
in BEND macrohabitats compared to TRM (tributary mouth). Also, mean GPUA for both 
species increased in year 1998 compared to year 1996. Electrofishing is best when considering 
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both species (although not the best gear for samphng channel catfish only). Importantly, 
this information provides fisheries biologists with better understanding about the common 
habitat usage and distribution of the two species in the Missouri River. 

Of particular interest in this analysis is the potential nonlinear effect of log(turbidity) and 
depth on CPUA for these two species. As previously noted, channel catfish and common carp 
are both generalist species and thus tend to exhibit similar habitat selection. Our results 



corroborate the findings of Wildhaber et al. (2011) but provide more detailed information 



about the effects of depth and turbidity on the relative abundance of these two species 
(Figure |2]). Specifically, a nonlinear relationship between log(turbidity) and the BivZlP log- 
intensity parameters is present for both species (Figure [2] (a) and (b)). In particular, these 
figures demonstrate that the mean CPUA of channel catfish increases with higher turbidity 
whereas the mean CPUA of common carp was highest around mid-range levels of turbidity 
but decreased for higher turbidity values. Similarly, depth is nonlinearly related to the 
BivZIP log-intensity parameters for channel catfish with peaks around lower depth levels (1 
to 3 meters; Figure [2] (c)). However, given the relatively higher uncertainty in the nonlinear 
plot of depth for common carp (Figure [2] (d)), there does not seem to be strong evidence 
to support a potential nonlinear effect of depth on the log-intensity parameter for common 
carp. 

Finally, several factors significantly impact the zero-infiation probabilities for both 
species. For the sake of brevity, discussion is limited to the most salient results, with a 
comprehensive list provided in Table |3j The multinomial logit model on probabilities corre- 
sponds to the log ratio of pi, p2, and ps relative to the baseline probability po, which is the 
probability of observing zero values for both species. For the model corresponding to pi, the 
probability that channel catfish data is non-zero and common carp is zero, the only statisti- 
cally significant (positive) effect, relative to the baseline (segment 27), is segment 15, which 
is also significantly higher than segments 19 and 25, relative to the baseline. For the model 
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corresponding to p2, the probability that channel catfish observation is zero and common 
carp is non-zero, the only significant effect, relative to the baseline (TRM), is BEND, which 
is significantly less than zero. Finally, for the model corresponding to ps, the probability that 
observations for both species are non-zero, relative to the baseline, segment 9 is significantly 
greater than zero and significantly higher than segments 17 and 19, whereas drifting trammel 
net is significantly lower than the baseline (electrofishing) and all other gears (relative to the 
basehne) . 

It is important to stress that these findings are based on a bivariatc model and provide 
information that was unobtainable in previous analyses of these data. The environmental 
variables were not included in the final multinomial logit models presented for the probabil- 
ities, due to the fact that a "fulF' model including these variables (not shown) exhibited no 
significant effects (for these variables). These findings provide useful information surround- 
ing the presence and absence of both species based on environmental attributes of the river 
as well as sampling procedures (i.e., gears). Ultimately this information is crucial to fish- 
eries biologists interested in optimizing management efforts and designing future monitoring 
programs. 

5 Conclusion 

Ecological and environmental processes often produce count data having a high proportion 
of zeros. Typically these processes are related to externally measured explanatory variables 
through complex nonlinear relationships. Furthermore, when developing models of species 
abundance it is often advantageous to allow correlation among the response variable for 
multiple populations under consideration. We propose an extremely fiexible hierarchical 
Bayesian semiparametric BivZlP model and note that this model can be utilized across a 
diverse range of applications including those outside the natural sciences. Moreover, our 
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approach represents the first attempt at semiparametric modehng of bivariate zero-inflated 
counts. The effectiveness of our approach is demonstrated through a motivating application 
to modeling fish catch per unit area based on observations from two closely relates species 
of benthic fish in the Missouri River. 

The example presented here illustrates the utility of our model for drawing inference 
on complex ecological data. In particular, our model accommodates inherent nonlinear 
relationships, while allowing for the high proportion of zeros in the data. In several instances 
we are able to provide inference where traditional modeling approaches were unsuccessful. 
For instance, despite the fact that each species exhibited a large percentage of zero catches, 
we were able to find a significant effect due to gears. Furthermore, our approach was able to 
show, with statistical significance, higher catch for high log(turbidity) values and higher catch 
for low to mid-range depth values for channel catfish and low to mid-range log(turbidity) 
values for common carp. 

Our model borrows strength across correlated species, which is fundamentally important 
for successful modeling of abundance of multiple species. One potential practical limitation 
of any bivariate (or multivariate) approach, including ours, is that by pairing data for two 
or more species the number of cases that may have to be excluded from the model due to 
extreme sparsity or missing data increases. Nonetheless, in our analysis, we are still able to 
exclude fewer observations than would be necessary using standard statistical methods. In 
addition, using our approach, missing data can be readily dealt with using data imputation 
techniques, which are particularly easy to implement in the Bayesian framework. Finally, 
our method can be adapted to further avoid excluding data arising from sparsity due to 
low catch rates. In this case, science based prior distributions or data from previous and/or 
similar studies may be readily incorporated into the model hierarchy. 
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Appendix: Conditional Distributions for MCMC 

In this section, we describe the MCMC algorithm for Model 3 (M3) and note that the 
migration to other models is analogous. Recall, M3 is given by 

log{Xij) = Pi^igeaTj + /3£,2 segment^. + /J^^smacrohab^- + Pi^^year- + /J^^ssubstrate^ 

+ fe,e{deipthj) + fej{ltVLrhj)+eij, (A.l) 

log{prj/poj) = 7r0 + 7rigear^- + 7r2segmentj- + 7^3macrohabj + 7^4year^- 

+ 7r5substratej, (A. 2) 

for £ = 1,2,3, r = 1,2,3, and j — l,...,n. Using Bayes' theorem, and assuming condi- 
tional independence, the posterior distribution of the processes and parameters given the 

observations can be expressed as 

[Ai,A2, A3, Bi, B2, B3, 7i, 72, 73) Ui,iturb, Ui,depth, U2,iturb, U2,d6pth, CTui.lturb; ^«i,depth) C^U2,lturb> <^U2,depth) 
^£1' <^£2' '^^al'^l' "^2] OC [Yi, Y2IA1, A2, A3, 7l, 72, 73][Ai|Bi, Ui,iturb, Ui,depth, C^ei , Cr^i,lturb> C^wi.depth] 
X [A2IB2, U2,lturb, U2,depth, (T^^: <,lturb> ^.depth] [^3 1 B3, (^^^3] [7l] [72] [Ts] Pl] [B2] [B3] [Ui] [U2] 
^ ["^£2] ['^£3] ['^ui.lturbl ["^ui, depth] ["^^2,1*11^] ['^U2, depth] ) 

where B^ denotes the vector of coefficients associated with the linear effects in each log- 
intensity model. 

This complicated posterior distribution can be numerically evaluated using MCMC methods 
and, in particular, Gibbs sampling based on full-conditional distributions of the unknown 
processes and parameters. Also, Metropohs-Hastings (M-H) steps within the Gibbs algo- 
rithm are required for simulation of A^ = log(A^) {i = 1, 2, 3) and 7^. (r = 1, 2, 3). 

The full-conditional distributions and sampling steps are: 
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Generate the latent process (Z3). For j — 1, . . . ,n, draw each element of Z3 (i.e., Z^j) 
from a DUnif{0, min(Yij, 1^2^)} , then set 



Zi = Yi - Z3 and Z2 = Y2 - Z3. 

[A^l-] (^ = 1,2) 
M-H step: 

~ * ~ (i— 1) 

1. Generate a candidate ~ N(\ i &t^) the ith MCMC iteration (where 9(, is a 
tuning parameter chosen such that the acceptance rate for the M-H algorithm is between 
20% and 40%), and compute the ratio 

fV V rr2'(*-l) 2,{i-l) . 

~ rV V 2,(i-l) 2,(i-l) n' 

where = (B^, u'^^^^^^^, K^eptJ- 
2. Set A^*^ = X}j {j = 1, . . . ,n) with probability min(i?j,l); else set A^*^ = A^* 

[Aal-] 

M-H step: 

— * — (^— 1) 

1. Generate a candidate A3 ~ A^(A3 , 63I) at the ith MCMC iteration (where 63 is the 
tuning parameter chosen such that the acceptance rate for the M-H algorithm is between 
20% and 40%), and compute the ratio 

[Yi.Y,|A;||A;|Br",4;'-'| 

2. Set Ag^- = Xlj with probability min(i?j,l); else set Ag*- = Ag^- ^\ 

hr\-] (r=l,2,3.) 

M-H step: 
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1. Generate a candidate 7* ~ N('yi^ ^\ 6rT) at the ith MCMC iteration, and compute 
the ratio 

[Yi,Y2|Ai,A2,A3,7:][7:] 
[Yl,Y2|^,A2,A3,7^'^][7^'V 

2. Set 7^*^ = 7*^- {j = 1, . . . ,n) with probabihty min(i?j-,l); else set 7^*^ = 7^}"^^- 

Next, the zero-inflation probabihties are derived based on 7^s at each iteration. Specif- 
ically, Po = 1/D and = exp(X^7^)/D, and D = 1 Yll=i exp(X-^7^). 
where pg and p^ (r = 1,2,3) denote the vectors of probabilities and denotes the 
design matrix for the multinomial logit regression models. 

• Update the regression coefficients and spline coefficients jointly: oc [A^l 

with, 

^Ui^l %,2 ^£€,0 

/ ~ ^se -1 

where Ei = diag(0^, 1^^, 0«J, E2 = diag(0^, 0^^, = diag(l^, 02x«J, rn de- 

notes the number of linear effects and = 20 is the number of knot locations. 




[B3|-]a[A3|B3][B3] 

B3|-~iV(Ab,A), 
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with, 



where. 



X3X3 + 



£3 



£3,0 



b = X3A3 H — |^B3_o. 



£3,0 



1 



•St 



where i — 1, 2, 3 (note that for £ = 3, $3 = B3 and C3 = X3), and the prior distribu- 
tion for [(Tg ] is chosen as IG(c£^,o, ^^q,o)- 



• [<|-]oc[A,|<][<] 

f^il- ~ IG(c„,(i„) 

where, 

du = (-7^ h O.Su^u^ 

\<J'ue,0 

where the prior distribution for [a^J is chosen as lG{cuffl, du^fl). Note that for each of 
the nonhnear effects, log (turbidity) and depth, this has to be done separately; however, 
for brevity, we only describe the procedure for the variance component of one of the 
nonlinear effects. 




23 



References 



Arab, A. (2007). "Hierarchical Spatio- Temporal Models for Environmental Processes." Un- 
published Ph.D. Dissertation, University of Missouri, Department of Statistics. 

Arab, A., Wildhaber, M., Wiklc, C, and Gentry, C. (2008). "Zcro-lnflatcd Modeling of 
Fish Catch per Unit Area Resulting from Multiple Gears: Application to Channel Catfish 
and Shovelnose Sturgeon in the Missouri River." North American Journal of Fisheries 
Management, 28, 4, 1044-1058. 

Berry, C, Wildhaber, M., and Galat, D. (2005). "Fish Distribution and Abundance. Pop- 
ulation Structure and Habitat Use of Benthic Fishes Along the Missouri and Lower Yel- 
lowstone." Tech. rep.. United States Geological Survey. 

Berry, C. and Young, B. (2001). "Introduction to the Benthic Fishes Studies. Population 
Structure and Habitat Use of Benthic Fishes Along the Missouri and Lower Yellowstone 
Rivers." Tech. rep.. United States Geological Survey. 

Breslow, N. and Clayton, D. (1993). "Approximate Inference in Generalized Linear Mixed 
Models." Journal of the American Statistical Association, 88, 421, 9-25. 

Brumback, B., Ruppert, D., and Wand, M. (1999). "Variable Selection and Function Es- 
timation in Additive Nonparamctric Regression Using a Data-Based Prior: Comment." 
Journal of the American Statistical Association, 94, 447, 794-797. 

Chiogna, M. and Gaetan, C. (2007). "Semiparametric Zero-Inflated Poisson Models with 
Application to Animal Abundance Studies." Environmetrics , 18, 3, 303-314. 

Clarke, K. and Green, R. (1988). "Statistical Design and Analysis for a "Biological Effects" 
Study." Marine Ecology Progress Series, 46, 1, 213-226. 

Cohen, A. (1963). "Estimation in Mixtures of Discrete Distributions." In Proceedings of the 
International Symposium on Discrete Distributions, Montreal, 373-378. 

Crainiceanu, C, Ruppert, D., and Wand, M. (2005). "Bayesian Analysis for Penalized Sphne 
Regression Using WinBUGS." Journal of Statistical Software, 14, 14, 1-24. 

Cressie, N. and Wikle, C. (2011). Statistics for Spatio- Temporal Data. John Wiley and Sons. 

Dagne, G. (2010). "Bayesian Semiparametric Zero-Inflated Poisson Model for Longitudinal 
Count Data." Mathematical Biosciences, 224, 2, 126-130. 

Fahrmeir, L. and Echavarria, L. (2006). "Structured Additive Regression for Overdispersed 
and Zero-Inflated Count Data." Applied Stochastic Models in Business and Industry, 22, 
4, 351-369. 

Fahrmeir, L. and Tutz, G. (2001). Multivariate Statistical Modelling Based on Generalized 
Linear Models. Springer Verlag. 



24 



Gimenez, O., Crainiceanu, C, Barbraud, C, Jenouvrier, S., and Morgan, B. (2006). "Semi- 
parametric Regression in Capture-Recapture Modeling." Biometrics, 62, 3, 691-698. 

Hall, D. (2000). "Zero- Inflated Poisson and Binomial Regression with Random Effects: A 
Case Study." Biometrics, 56, 4, 1030-1039. 

Holan, S., Wang, S., Arab, A., Sadler, E., and Stone, K. (2008). "Semiparametric Geograph- 
ically Weighted Response Curves with Apphcation to Site-Specific Agriculture." Journal 
of Agricultural, Biological, and Environmental Statistics, 13, 4, 424-439. 

Johnson, N., Kotz, S., and Balakrishnan, N. (1997). Discrete Multivariate Distributions, vol. 
157. Wiley New York. 

Kammann, E. and Wand, M. (2003). "Geoadditive Models." Journal of the Royal Statistical 
Society: Series C (Applied Statistics), 52, 1, 1-18. 

Kocherlakota, S. and Kocherlakota, K. (1992). Bivariate Discrete Distributions. Chapman 
& Hall/CRC. 

Lam, K., Xue, H., and Bun Cheung, Y. (2006). "Semiparametric Analysis of Zero- Inflated 
Count Data." Biometrics, 62, 4, 996-1003. 

Lambert, D. (1992). "Zero-Inflated Poisson Regression, with an Application to Defects in 
Manufacturing." Technometrics , 34, 1, 1-14. 

Li, C, Lu, J., Park, J., Kim, K., Brinkley, P., and Peterson, J. (1999). "Multivariate 
Zero-Inflated Poisson Models and Their Applications." Technometrics , 41, 1, 29-38. 

Liu, H., Ciannelli, L., Decker, M., Ladd, C, and Chan, K. (2010). "Nonparametric Thresh- 
old Model of Zero-Inflated Spatio- Temporal Data with Application to Shifts in Jellyfish 
Distribution." Journal of Agricultural, Biological and Environmental Statistics, 1-17. 

Majumdar, A., Cries, C, Walker, J., and Grimm, N. (2010). "Bivariate Zero-Inflated Re- 
gression for Count Data: A Bayesian Approach with Application to Plant Counts." The 
International Journal of Bio statistics, 6, 1. 

Martin, T., Wintle, B., Rhodes, J., Kuhnert, P., Field, S., Low-Choy, S., Tyre, A., and 
Possingham, H. (2005). "Zero Tolerance Ecology: Improving Ecological Inference by 
Modelling the Source of Zero Observations." Ecology Letters, 8, 11, 1235-1246. 

Minami, M., Lennert-Cody C, Gao, W., and Roman- Verdesoto, M. (2007). "Modehng 
Shark by Catch: The Zero-Inflated Negative Binomial Regression Model with Smoothing." 
Fisheries Research, 84, 2, 210-221. 

Musio, M., Sauleau, E., and Buemi, A. (2010). "Bayesian Semi-Parametric ZIP Models 
with Space-Time Interactions: An Apphcation to Cancer Registry Data." Mathematical 
Medicine and Biology, 27, 2, 181. 



25 



Ntzoufras, I. (2009). Bayesian Modeling Using WinBUGS . Wiley: Hoboken, New Jersey. 

Robert, C. and Casclla, G. (2004). Monte Carlo Statistical Methods. Springer- Ver lag. 

Royle, J. and Dorazio, R. (2008). Hierarchical Modeling and Inference in Ecology: The 
Analysis of Data from Populations, Metapopulations and Communities. Academic Press. 

Ruppert, D., Wand, M., and Carroll, R. (2003). Semiparametric Regression. Cambridge 
University Press. 

— (2009). "Semiparametric Regression During 2003-2007." Electronic Journal of Statistics, 
3, 1193. 

Schmidt, A. and Rodriguez, M. (2011). "Modelling Multivariate Counts Varying Contin- 
uously in Space." In Bayesian Statistics 9, eds. J. Bernardo, M. Bayarri, J. Bcrgcr, 
A. Dawid, D. Heckerman, S. A. F. M., and M. West. Oxford: Oxford University Press. 

Skellam, J. (1952). "Studies in Statistical Ecology: Spatial Pattern." Biometrika, 39, 3-4, 
346. 

Spiegellialter, D., Best, N., Carlin, B., and Van Der Lindc, A. (2002). "Bayesian Measures of 
Model Complexity and Fit." Journal of the Royal Statistical Society: Series B (Statistical 
Methodology), 64, 4, 583-639. 

Tsionas, E. (2001). "Bayesian Multivariate Poisson Regression." Communications in 
Statistics- Theory and Methods, 30, 2, 243-255. 

Ver Hoef, J. and Janscn, J. (2007). "Space - Time Zero-Inflated Count Models of Harbor 
Seals." Environmetrics , 18, 7, 697-712. 

Wang, K., Yau, K., and Lee, A. (2002). "A Zero-Inflated Poisson Mixed Model to Analyze 
Diagnosis Related Groups with Majority of Same-Day Hospital Stays." Computer Methods 
and Programs in Biomedicine, 68, 3, 195-203. 

Welsh, A., Cunningham, R., Donnelly, C, and Lindenmayer, D. (1996). "Modelling the 
Abundance of Rare Species: Statistical Models for Counts with Extra Zeros." Ecological 
Modelling, 88, 1-3, 297-308. 

Wikle, C. (2003). "Hierarchical Models in Environmental Science." International Statistical 
Review, 71, 2, 181-199. 

Wikle, C. and Anderson, C. (2003). "Climatological Analysis of Tornado Report Counts 
Using a Hierarchical Bayesian Spatio-temporal Model." Journal of Geophysical Research- 
Atmospheres, 108, D24, 9005, doi:10.1029/2002JD002806. 

Wildhaber, M. L., Gladish, D., and Arab, A. (2011). "Distribution And Habitat Use Of 
The Missouri River And Lower Yellowstone River Bcnthic Fishes From 1996 To 1998: A 
Baseline For Fish Community Recovery." River Research and Applications , In Press. 



26 



Zhao, Y., Staudenmayer, J., CouU, B., and Wand, M. (2006). "General Design Bayesian 
Generalized Linear Mixed Models." Statistical Science, 21, 1, 35-51. 



27 



Model 


Description 


DIG 






Linear 


Nonhnear 




Ml 


Ai 


segment; macrohab; substrate; gear; year 


temp; depth; hurb; conduct 


385, 260 




A2 


segment; macrohab; substrate; gear; year 


temp; depth; hurb; conduct 






A3 


segment; macrohab; substrate; gear; year 


temp; depth; Iturb; conduct 




M2 


Ai 


segment; maerohab; substrate; gear; year; condiict; temp 


hurb; depth; 


385, 440 




A2 


segment; macrohab; substrate; gear; year; conduct; temp 


hurb; depth; 






A3 


segment; macrohab; substrate; gear; year; conduct; temp 
hurb; depth 


- 




M3 


Ai 


segment; macrohab; substrate; gear; year; 


hurb; depth 


384, 850 




A2 


segment; macrohab; substrate; gear; year; 


hurb; depth 






A3 


segment; macrohab; substrate; gear; year; Iturb; depth 


- 




M4 


Ai 


segment; macrohab; gear 


hurb; depth 


386, 670 




A2 


segment; macrohab; gear 


hurb; depth; 






A3 


segment; macrohab; gear; year 






M5 


Ai 


segment; macrohab; substrate; gear; year 


temp; depth; hurb; conduct 


385, 110 




A2 


segment; macrohab; substrate; gear; year 


temp; depth; hurb; conduct 






A3 


segment; macrohab; substrate; gear; year; temp; depth; 
Iturb; conduct 






M6 


Ai 


segment; macrohab; substrate; gear; year; temp; depth; 
Iturb; conduct 


- 


387, 230 




A2 


segment; macrohab; substrate; gear; year; temp; depth; 
Iturb; conduct 


- 






A3 


segment; macrohab; substrate; gear; year; 


temp; depth; hurb; conduct 




M7 


Ai 


segment; macrohab; substrate; gear; year; temp; depth; 
Iturb; conduct 




387, 630 




A2 


segment; macrohab; substrate; gear; year; temp; depth; 
Iturb; conduct 








A3 


segment; macrohab; substrate; gear; year; temp; depth; 
Iturb; conduct 







Table 1: For each log- intensity model, linear and nonlinear covariates are identified. The fol- 
lowing notation was used in the table: lturb= log (turbidity), temp=water temperature, con- 
duct=conductivity, and macrohab=macrohabi^ats. 



Par am. 


channel catfish 
Posterior Mean (95% CI) 


common carp 
Posterior Mean (95% CI) 


common process 
Posterior Mean (95% CI) 


Seg3 


-2.8977 (-4.0473, -1.6142) 


0.5730 (-0.1712, 1.3153) 


-0.6527 (-1.6418, 0.3045) 


Seg5 


-2.5796 (-3.4236, -1.7467) 


0.1924 (-0.4510, 0.8286) 


-0.7982 (-1.6160, 0.0006) 


Seg9 


-1.6931 (-2.3611, -1.0417) 


0.2243 (-0.4124, 0.8627) 


-0.7336 (-1.6109, 0.1172) 


Segl5 


-1.7297 (-2.4873, -0.9597) 


0.2340 (-0.4479, 0.9211) 


-0.7650 (-1.6191, 0.0357) 


Segl7 


0.0624 (-0.7645, 0.8735) 


-1.0825 (-1.7457, -0.4198) 


-1.8191 (-2.7630, -0.9727) 


Segl9 


-0.4098 (-1.2070, 0.3949) 


-0.8706 (-1.4945, -0.2428) 


-1.0380 (-1.7323, -0.3670) 


Seg22 


0.4423 (-0.1653, 1.0605) 


-0.8797 (-1.4709, -0.2888) 


-0.4698 (-1.1191, 0.1501) 


Seg23 


0.1924 (-0.3921, 0.7714) 


-0.4901 (-1.0620, 0.0625) 


-0.2710 (-0.8857, 0.3305) 


Seg25 


-0.5881 (-1.1454, -0.0224) 


0.1891 (-0.3409, 0.7235) 


-0.7966 (-1.4429, -0.1762) 


Year96 


-0.4027 (-0.8393, 0.0365) 


-0.2338 (-0.6302, 0.1581) 


-1.0172 (-1.5447, -0.4924) 


Yeara i 


U.U/Oo (-U.oiiz, UAbZl) 


-U.iooo (-U.4D/0, U.iyUoj 


n QQ1Q { n TQQ/i n ^K'7^^^ 
-U.ooio (-U. /oo4, (J.Uomj 


BT 


1.1022 (0.4357, 1.7808) 


-4.0910 (-5.3555, -2.4229) 


-3.9054 (-5.1592, -2.8494) 


BS 


4.4361 (3.7535, 5.1360) 


0.2288 (-0.4262, 0.8774) 


-2.6593 (-3.4105, -1.9717) 


DTN 


-2.5867 (-3.4899, -1.6776) 


-4.6965 (-5.6991, -3.7468) 


-4.0462 (-5.2319, -2.9813) 


BEND 


0.8567 (0.2506, 1.4599) 


-0.7176 (-1.1309, -0.2940) 


0.7602 (0.2961, 1.2191) 


sec 


0.3071 (-0.3347, 0.9511) 


-0.9172 (-1.4212, -0.4202) 


0.1124 (-0.4598, 0.6776) 


SCN 


-0.0788 (-0.9383, 0.7769) 


0.0417 (-0.5926, 0.6756) 


-0.1039 (-1.0000, 0.7414) 


Gravel 


-0.1088 (-0.8505, 0.6157) 


-0.1888 (-0.8079, 0.4266) 


-0.2771 (-1.1363, 0.5704) 


Sand 


-0.0093 (-0.4739, 0.4482) 


-0.3235 (-0.7265, 0.0806) 


-0.4431 (-0.9843, 0.0815) 



Table 2: Posterior mean and 95% Credible Intervals (CI's) for the log-intensity model 
parameters for channel catfish, common carp, and the common process. 



Par am. 


Pi 

Posterior Mean (95% CI) 


P2 

Posterior Mean (95% CI) 


P3 

Posterior Mean (95% CI) 


intercept 


-4.7028 (-9.0578, 0.0402) 


1.4526 (-2.5284, 5.1261) 


6.8226 (3.5612, 10.5133) 


Seg3 


-0.8611 (-6.6299, 3.2029) 


-0.0640 (-6.6842, 5.3230) 


2.3359 (-0.8227, 5.4368) 


Scg5 


2.0933 (-1.5951, 4.9540) 


2.3959 (-2.1738, 6.1621) 


1.3283 (-1.9495, 4.7725) 


Seg9 


3.3999 (-2.3802, 6.9207) 


-0.9648 (-7.2835, 4.2919) 


5.2317 (1.9344, 9.1258) 


Scgl5 


5.8175 ( 2.9501, 8.9167) 


-0.5501 (-6.5833, 4.3153) 


1.4903 (-2.3358, 4.9361) 


Segl7 


0.5610 (-2.8393, 3.5829) 


2.2280 (-0.9609, 5.8953) 


-2.5977 (-5.7459, 0.5701) 


Segl9 


-2.4769 (-7.4011, 1.3635) 


0.6779 (-2.7600, 4.1631) 


-1.8595 (-4.6993, 0.8400) 


Seg22 


0.7295 (-3.3883, 3.5429) 


-0.5137 (-5.5076, 3.1892) 


1.7871 (-1.2141, 4.9684) 


Seg23 


1.5955 (-0.8711, 3.9552) 


0.2700 (-4.3055, 3.6490) 


0.2901 (-2.2090, 3.0414) 


Seg25 


-2.0397 (-6.4379, 1.4508) 


-1.4959 (-6.6522, 2.3610) 


0.4252 (-2.0641, 2.8578) 


Year96 


0.1058 (-2.7153, 2.3998) 


0.4619 (-2.9069, 3.4959) 


-0.8980 (-2.6601, 0.9159) 


Year97 


0.3047 (-1.4503, 2.2170) 


1.7342 (-0.7826, 4.9601) 


-1.1757 (-2.8134, 0.4413) 


BT 


3.1982 (-6.5612, 9.9808) 


-1.4747 (-7.1871, 3.4423) 


0.9154 (-2.7345, 5.5469) 


BS 


0.1692 (-6.5510, 6.4718) 


-0.9743 (-6.6982, 5.4512) 


2.8631 (-1.2833, 8.0389) 


DTN 


1.2121 (-2.7454, 5.7970) 


-1.5209 (-5.5394, 3.0484) 


-8.7597 (-11.7531, -6.3251) 


BEND 


0.2153 (-3.1624, 3.8066) 


-2.9688 (-6.0879, -0.1890) 


-1.0458 (-4.2099, 1.6557) 


sec 


-0.0985 (-4.1622, 4.2813) 


-3.7038 (-7.9150, 0.3265) 


0.4479 (-2.8572, 3.7453) 


SCN 


0.2414 (-6.2622, 6.3344) 


-1.1618 (-6.3401, 3.8653) 


1.5549 (-3.8720, 7.3595) 


Gravel 


0.3548 (-3.4301, 4.1525) 


-1.2147 (-6.1241, 2.9013) 


0.9386 (-2.0465, 4.2275) 


Sand 


0.4618 (-2.7649, 4.0411) 


-2.2307 (-7.4101, 1.3348) 


-0.0256 (-2.7653, 2.4096) 



Table 3: Posterior mean and 95% Credible Intervals (CPs) for the multinomial logit regres- 
sion model parameters for pi, p2 and ps with respect to the baseline probability po- 



<^Sturgeon Island ■ Beauchamp Coulee (km 3141.1-3029.3) 

(?) Milk River - Hwy 13 bridge (km 2831.8-2736.9) 



<3> Arrow Creek - BIr 



Missouri 
River 
Source 



h Creek (km 3217.0-3186.8 



Wolf Point, MT - Yellowstone River (km 2736.9 - 2545.4) 

^ Intake Diversion Dam - MO River Confluence (km 114.2 - 0.0) 
10) Yellowstone River - Lake Sakakawea headwaters (km 2S4S.4 - 2497.2) 



(12) Garrison Dam - Lake Oahe headwaters (km 2234.9 - 2098.1) 



(|4) Fort Randall Dam - Lewis and Clark Lake headwaters (km 1415.9 - 1343.5) 
@ Gavins Point Dam to Ponca, NE (km 1303.3 - 1211.6) 



SD J 

^ Lake Oahe 
■ (Oalie Dam) [ 




yfj Big Sioux River - Little Sioux River (km 1190.7 - 1076.7) 
19) Platte River - NIshnabotna River, NE (km 958.2 - 872.1) 
a) St. Joseph - Kansas River, MO (km 708.0 - 591.3) 
^ Kansas River to Grand River (km 591.3 - 402.2) 
aj Glasgow, MO - Osage River (km 354.0 - 209.8 

Ta I 



I (27/ km 80.5 to Mississippi 

River Confluence (km 80.5 - 0.0) 



' Vj St. Louis 



Figure 1: Missouri River Benthic Fishes Study area from Mon tana to the conflue nce of 
the Missouri River with the Mississippi River in Missouri, USA (Berry et aL (2005); (} = 
Least- Ahered (LA), Q = Inter- Reservoir (IR), O = Channehzed (CH) Segments)^ 



2 3 4 5 6 7 

log(turbidity) 



2 3 4 5 6 7 

log(turbidity) 




Figure 2: Posterior mean for the nonlinear functions in M3 (solid line) and point-wise 95% 
CIs (shaded area) for both species: (a) log(turbidity) (channel catfish), (b) log (turbidity) 
(common carp), (c) depth (channel catfish), and (d) depth (common carp). 



